Cell‐free RNA and fully convolutional dense network‐based early preeclampsia prediction

Dear Editor, We propose a fully convolutional dense network (FCDN) model [1,2] to predict preeclampsia (PE) with circulating cell-free RNA (cfRNA) [3-5]. The Individual Risk Score (IRS) output of the proposed FCDN model contributes to the literature on consistently monitoring the risk of PE, evaluating the effect of prophylactic treatments, and providing accurate and rapid screening and diagnosis of PE in populous developing countries with a high incidence of PE, such as China. PE is a pregnancy-specific hypertensive disorder that leads to 10.2 deaths per 100 000 pregnancies [6-8], remaining the second leading cause of maternal death in China.

Diagnoses of PE are still regularly missed or delayed, and predicting PE in early gestation remains challenging. The trained network is designed to predict PE risk, expressed as an IRS, from variations in an individual's cfRNA profile in early pregnancy (Figure 1A). As a first step, standardized and cleaned cfRNA sequencing data from normal pregnancy (NP) and PE samples, collected at ≤12 gestational weeks (gws) or at 13-20 gws, were downloaded from GSE192902 [5] in the Gene Expression Omnibus. A total of 7160 detected cfRNAs were filtered to select those with significant changes that could be used as indicators of PE risk. As illustrated in Figure 1B, we used multiple tests to optimize the parameters of the algorithm; 29 cfRNAs were chosen as PE indicators for samples sequenced at ≤12 gws, and 25 cfRNAs were selected for samples sequenced at 13-20 gws (Table S1).

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Given that neural network models require large datasets for training, we analyzed the rates of change of real-world cfRNA profiling data [5] and generated a synthetic dataset to train the prediction model, based on the Gaussian function given as Equation (1).
In Equation (1), N = s; rands() is the Gaussian random function, Max() and Min() are the maximum and minimum value functions, respectively, and M and Q are the numbers to be produced.
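As a rough illustration of the idea of Gaussian-based data synthesis, the following Python sketch draws each cfRNA value from a Gaussian centred between the minimum and maximum observed for that cfRNA and clips it to the observed range. This is not Equation (1): the function name `synthesize_profiles`, the choice of centre and spread, and the toy input values are all assumptions made for illustration.

```python
import random

def synthesize_profiles(real_profiles, n_synthetic, seed=0):
    """Hypothetical helper: draw synthetic profiles feature by feature."""
    rng = random.Random(seed)
    n_features = len(real_profiles[0])
    # Per-feature minimum and maximum observed in the real data.
    lows = [min(p[j] for p in real_profiles) for j in range(n_features)]
    highs = [max(p[j] for p in real_profiles) for j in range(n_features)]
    synthetic = []
    for _ in range(n_synthetic):
        profile = []
        for lo, hi in zip(lows, highs):
            centre = (lo + hi) / 2.0
            spread = (hi - lo) / 6.0  # assumption: range covers about +/- 3 sigma
            value = rng.gauss(centre, spread)
            # Clip so that no synthetic value leaves the observed range.
            profile.append(min(max(value, lo), hi))
        synthetic.append(profile)
    return synthetic

# Toy values for three cfRNAs in three samples (not study data).
real = [[0.2, 1.5, 3.0], [0.4, 1.1, 2.0], [0.3, 1.9, 2.5]]
synthetic = synthesize_profiles(real, n_synthetic=5)
```

Clipping keeps every synthetic value inside the range seen in the real profiles; whether the authors constrained their generated data in this way is not stated in the text.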
Therefore, the vector set of cfRNA contributions (y_train and y_test) can be calculated from x_train, x_test, and the clinical diagnosis (prior knowledge), as given in Equations (2) and (3).
In Equations (2) and (3), N = s and avg() is the average value function. The dataset used for FCDN model training and validation is shown in Supplementary Table 2.
Only one clinical diagnosis was available for each enrolled woman: NP or PE. We therefore assigned women in the PE group the maximum IRS of 1 and women in the NP group the minimum IRS of 0. From the sequenced cfRNA of the enrolled women, we calculated their IRS at each sampling time using Equations (2) and (3), and this calculated IRS is regarded as the ground truth. At a sampling time of ≤12 gws, the average calculated IRS was 0.27 in the NP group and 0.47 in the PE group (Figure 1C). At a sampling time of 13-20 gws, the average calculated IRS was 0.39 in the NP group and 0.57 in the PE group (Figure 1D). The average calculated IRS of the NP group differed notably from that of the PE group (Supplementary Figure 1). These results suggest that the filtered cfRNA indicators work well to distinguish NP from PE and support the application of the proposed model in clinical practice.
Next, we used an FCDN model to perform data regression (Figure 1E). The procedure for FCDN-based PE prediction is shown in Figure 1F.

FIGURE 3 Cell-free RNA and fully convolutional dense network-based early preeclampsia prediction. This study developed a deep learning algorithm to evaluate the Individual Risk Score (IRS) of pregnant women using cell-free RNA (cfRNA) profiling; the IRS output of the proposed FCDN model contributes to the literature on consistently monitoring the risk of PE, evaluating the effect of prophylactic treatments, and providing accurate and rapid screening and diagnosis of PE in populous developing countries. In the current study, different datasets were used for model training and validation. The dataset in GSE192902 was divided into Discover Cohort, Validation 1 Cohort, and Validation 2 Cohort. For model training, Validation 2 Cohort (87 sets of real-world cfRNA profiles) and 7913 computer-generated cfRNA profiles were employed. For model validation, 1000 computer-generated cfRNA profiles were employed. For the final model validation (application), we used Discover Cohort and Validation 1 Cohort, which include 215 sets of real-world cfRNA profiles. NP, normal pregnancy; PE, preeclamptic pregnancy; cfRNA, circulating cell-free RNA; IRS, Individual Risk Score.

In the current study, different datasets were used for model training and validation. The dataset in GSE192902 was divided into Discover Cohort, Validation 1 Cohort and Validation 2 Cohort. For model training, Validation 2 Cohort (87 sets of real-world cfRNA profiles) and 7913 computer-generated cfRNA profiles were employed. For model validation, 1000 computer-generated cfRNA profiles were employed. For the final model validation (application), we used Discover Cohort and Validation 1 Cohort, which include 215 sets of real-world cfRNA profiles. A more detailed method for FCDN construction, training and validation is given in the Supporting Information. Through FCDN model training and validation (Figure 2A,B), the loss value (mean absolute error [MAE]) of the probability prediction decreased to 0.027, and an optimized model was obtained.
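The composition of the training, validation, and final validation sets described above can be laid out in a short Python sketch. The counts are taken from the text; the helper `assemble_splits`, the shuffling of the training set, and the placeholder records are assumptions for illustration and do not reproduce the authors' pipeline.

```python
import random

# Counts stated in the text.
N_REAL_TRAIN = 87      # Validation 2 Cohort (real-world profiles)
N_SYNTH_TRAIN = 7913   # computer-generated profiles used for training
N_SYNTH_VAL = 1000     # computer-generated profiles used for validation
N_REAL_FINAL = 215     # Discover Cohort + Validation 1 Cohort

def assemble_splits(real_train, synthetic, real_final, seed=0):
    """Hypothetical helper: combine real and synthetic profiles into three splits."""
    if len(synthetic) < N_SYNTH_TRAIN + N_SYNTH_VAL:
        raise ValueError("not enough synthetic profiles")
    train = list(real_train) + list(synthetic[:N_SYNTH_TRAIN])
    random.Random(seed).shuffle(train)  # assumption: mix real and synthetic samples
    validation = list(synthetic[N_SYNTH_TRAIN:N_SYNTH_TRAIN + N_SYNTH_VAL])
    return {"train": train, "validation": validation, "final": list(real_final)}

# Placeholder records standing in for (cfRNA profile, IRS) pairs.
real_train = [("real", i) for i in range(N_REAL_TRAIN)]
synthetic = [("synthetic", i) for i in range(N_SYNTH_TRAIN + N_SYNTH_VAL)]
real_final = [("real_final", i) for i in range(N_REAL_FINAL)]

splits = assemble_splits(real_train, synthetic, real_final)
sizes = {name: len(items) for name, items in splits.items()}
```

With these counts the training split holds 8000 profiles (87 real-world plus 7913 computer-generated), so real-world profiles make up only about 1% of the training data, while the final validation uses real-world profiles exclusively.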
To validate the prediction accuracy of the model, we used real-world cfRNA expression data [5] as the input set x_test for the FCDN model to obtain the FCDN-based IRS. The FCDN-based IRS was then compared with the ground truth (the calculated IRS) derived from real-world cfRNA profiling. At a sampling time of ≤12 gws, the tendency and amplitude of the FCDN-based IRS (prediction results) resembled the ground truth, demonstrating the fitting ability of our FCDN model (Figure 2C). The MAE between the prediction result and the ground truth was only 0.032. PE and NP can be separated using the averaged FCDN-based IRS. We also calculated the FCDN-based IRS for samples collected at 13-20 gws (Figure 2D). Over the whole scale, the prediction results approximate the ground truth. The MAE between the prediction result and the ground truth was only 0.041, indicating that the FCDN model predicted the ground truth well.
The error amplitude of the FCDN-based IRS in processing cfRNA samples ≤12 gws is shown in Figure 2E. The maximum value of the absolute error, the peak-to-valley (PV) value of the error, and the mean value of the absolute error were 0.12, 0.16 and 0.046, respectively. For samples within 13−20 gws ( Figure 2F), the maximum value of the absolute error reached 0.16, and the PV value of the error reached 0.27. The mean absolute error was 0.008. In short, the prediction error for IRS was well-controlled within a small amplitude, and the FCDN model was able to fit the data well.
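The three error statistics reported above can be computed as in the following minimal Python sketch. It assumes that the peak-to-valley (PV) value is the largest signed error minus the smallest signed error, which is the usual meaning of the term but is not defined in the text; the function name `error_summary` and the toy values are illustrative only.

```python
def error_summary(predicted, truth):
    """Hypothetical helper: summarize prediction error against the ground truth."""
    errors = [p - t for p, t in zip(predicted, truth)]  # signed errors
    abs_errors = [abs(e) for e in errors]
    return {
        "max_abs_error": max(abs_errors),
        # Assumed definition: highest signed error minus lowest signed error.
        "pv_error": max(errors) - min(errors),
        "mae": sum(abs_errors) / len(abs_errors),
    }

# Toy IRS values (not study data).
predicted_irs = [0.30, 0.25, 0.50, 0.45]
calculated_irs = [0.27, 0.27, 0.47, 0.47]
summary = error_summary(predicted_irs, calculated_irs)
```

Under this definition the PV value can never exceed twice the maximum absolute error, and the MAE can never exceed the maximum absolute error, which gives a quick consistency check on reported figures.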
We also considered processing efficiency, which matters when handling the large datasets collected in population screening. Prediction time was therefore used as a further benchmark to evaluate the method. In this test, the cfRNA profiling samples were fed into the trained FCDN model, and the time required to output an IRS value was recorded. As shown in Figure 2G,H, 15 consecutive experiments showed that the average time required to output an IRS was on the order of 10⁻⁵ s per sample.
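A timing test of this kind can be sketched in Python as follows. The trained FCDN model is not available here, so `toy_predict` is a stand-in that simply averages a profile; `mean_time_per_sample`, the sample count, and the profile values are likewise assumptions. Only the number of repeated runs (15) is taken from the text.

```python
import time

def mean_time_per_sample(predict, samples, n_runs=15):
    """Hypothetical helper: average wall-clock time per prediction over repeated runs."""
    per_sample_times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        for sample in samples:
            predict(sample)
        elapsed = time.perf_counter() - start
        per_sample_times.append(elapsed / len(samples))
    return sum(per_sample_times) / n_runs, per_sample_times

def toy_predict(profile):
    # Stand-in for the trained model: mean of the profile, clipped to [0, 1].
    return min(max(sum(profile) / len(profile), 0.0), 1.0)

# 100 toy profiles with 29 values each (29 indicators are used at <=12 gws).
samples = [[0.01 * (i % 50)] * 29 for i in range(100)]
mean_seconds, run_times = mean_time_per_sample(toy_predict, samples)
```

The measured value depends on the hardware and on the stand-in function, so it says nothing about the FCDN model's actual speed; the sketch only shows how a per-sample average over 15 runs can be obtained.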
In summary, we employed novel cfRNA biomarkers and an FCDN model to output an IRS for predicting PE. The prediction accuracy and computational time of the proposed model reached 0.95 and 10⁻⁵ s per sample, respectively. The reported method provides a reliable tool for rapid and minimally invasive monitoring of individual PE risk and sheds new light on maternal and neonatal healthcare (Figure 3).